The file titled US Electricity.csv includes a time series index compiled by the US Federal Reserve representing total fossil-fuel US electricity generation by all utilities from January 1939 through October 2021.
In the following code box we read the CSV file and set up the data as a tsibble and then we plot it and subset it to examine it.
#clear environment
rm(list = ls())
library(fpp3)
D <- read.csv("US Electricity.csv") %>%
mutate(DATE = yearmonth(DATE)) %>%
as_tsibble(index = DATE)
D %>% autoplot(ELEC)
DR <- D %>% filter(DATE >= yearmonth("2010 Jan"))
DR %>% autoplot(ELEC)
We are interested in developing a two-year long monthly forecast (24 months) for the national electricity production requirements.
#stationarity of DR - not stationary, need to add seasonality
#ACF and PACF
DR %>% gg_tsdisplay(ELEC, plot_type = "partial") +
labs(title="DR data", y="")
#add seasonality
DR.12 <- DR %>% gg_tsdisplay(difference(ELEC, 12),
plot_type='partial', lag=60) +
labs(title="Seasonally differenced - 12", y="")
DR.12
Warning: Removed 12 row(s) containing missing values (geom_path).
Warning: Removed 12 rows containing missing values (geom_point).
DR.4 <- DR %>% gg_tsdisplay(difference(ELEC, 4),
plot_type='partial', lag=60) +
labs(title="Seasonally differenced - 4", y="")
DR.4
Warning: Removed 4 row(s) containing missing values (geom_path).
Warning: Removed 4 rows containing missing values (geom_point).
#DR.d %>% ACF(ELEC) %>%
#autoplot() + labs(title = 'Electricity 2010 - 2020')
#DR.d %>% autoplot(.vars = diff2.E)
#DR.d %>% gg_tsdisplay(ELEC, plot_type = "partial")
DR is not stationary, all of the lags are significant lags in the ACF plot. We need to figure out how many differences to take until it is stationary. PACF shows only 2 very large significant lags at the beginning. Because the ACF is complex, and the PACF dies down slowly, I believe this a Moving Average forecasting model would best suite this data. It also looks like the ACF is showing some seasonality in the lags, as there is a pattern to the spikes. This makes sense as your electricity usage changes as the weather changes.
The PACF dies down so we will identify this model as a MA(1) or MA(2) seeing if seasonal MA or regular MA results in a better arima model. I think it could possibly be seasonality 4, rather than seasonality 12 as well. 1) ARIMA(1,1,1)(1,1,1) [12] 2) ARIMA(1,1,2)(1,1,1) [12] 3) ARIMA(1,1,1)(1,1,2) [12]
#how many differences it needs
DR %>% features(ELEC, unitroot_nsdiffs) #seasonal differences needed = 1
DR %>% features(ELEC,unitroot_ndiffs) #differences needed = 0
fitDR <- DR %>%
model( #none of the models worked with d = 0
arima1 = ARIMA(ELEC ~ pdq(1,1,1) + PDQ(1,1,1)),
arima2 = ARIMA(ELEC ~ pdq(1,1,2) + PDQ(1,1,1)),
arima3 = ARIMA(ELEC ~ pdq(1,1,1) + PDQ(1,1,2)),
auto.arima = ARIMA(ELEC), #100210
auto.ETS = ETS(ELEC)) #error("M") + trend("N") + season("A")
fitDR %>% pivot_longer(everything(), names_to = "Model name",
values_to = "Orders")
glance(fitDR) %>% arrange(AICc) %>% select(.model:BIC)
fitDR %>% augment() %>%
features(.resid, ljung_box, lag = 60)
fitDR %>% select(arima1) %>% gg_tsresiduals()
fitDR %>% select(arima2) %>% gg_tsresiduals()
fitDR %>% select(arima3) %>% gg_tsresiduals()
fitDR %>% select(auto.arima) %>% gg_tsresiduals()
fitDR %>% select(auto.ETS) %>% gg_tsresiduals()
For model cross-validation purposes stretch the DR data as follows:
fitDR %>% accuracy() %>% select(.model, MAPE, RMSE, MAE)
fitDR %>% glance() %>%
select(.model, AIC, AICc, BIC)
Best 2 arima Models = arima3 ARIMA(1,1,1)(1,1,2) [12] and arima1 ARIMA(1,1,1)(1,1,1) [12] ETS model = Elec ~ error(“M”) + trend(“N”) + season(“A”)
#kinda takes awhile
#given code
DR.CV <- DR %>%
filter(DATE >= yearmonth("2010 Jan")) %>%
stretch_tsibble(.init = 36, .step = 1)
mC <- DR.CV %>%
model(
arima1 = ARIMA(ELEC ~ pdq(0,1,1) + PDQ(0,1,1)),
arima2 = ARIMA(ELEC ~ pdq(0,1,2) + PDQ(0,1,1)),
arima3 = ARIMA(ELEC ~ pdq(0,1,2) + PDQ(0,1,2)),
ETS(ELEC ~ error("M") + trend("N") + season("A")))
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning: 36 errors (1 unique) encountered for arima3
[36] Could not find an appropriate ARIMA model.
This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
mC %>%
forecast(h = 4) %>% #each model ID you have 4 forecasts
group_by(.id, .model) %>% #ID of one corresponds to 3 models
mutate(h = row_number()) %>% #create variable h for the .id #
ungroup() -> fCV #forecast cross validation
fCV
fCV <- mC %>%
forecast(h = 24) %>% #for each of the 4 model IDs you have 24 forecasts
group_by(.id, .model) %>% #ID of one corresponds to 3 models
mutate(h = row_number()) %>% #create variable h for the .id #
ungroup() #forecast cross validation
fCV %>%
accuracy(D, by = c("h", ".model")) %>% #shows you all the CV metrics
ggplot(aes(x = h, y = MAPE, color = .model)) +
geom_line()
Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
24 observations are missing between 2021 Oct and 2023 Sep
fCV %>%
accuracy(D, by = c("h", ".model")) %>%
ggplot(aes(x = h, y = RMSE, color = .model)) +
geom_line()
Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
24 observations are missing between 2021 Oct and 2023 Sep
DR.CV <- DR %>%
filter(DATE >= yearmonth("2010 Jan")) %>%
stretch_tsibble(.init = 36, .step = 1)
DR.CV %>%
model(
ARIMA(ELEC ~ pdq(0,1,2) + PDQ(0,1,2)),
ETS(ELEC ~ error("M") + trend("N") + season("A"))) %>%
forecast(h = 24) %>%
accuracy(DR) %>%
select(.model, RMSE:MAPE)
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning: 36 errors (1 unique) encountered for ARIMA(ELEC ~ pdq(0, 1, 2) + PDQ(0, 1, 2))
[36] Could not find an appropriate ARIMA model.
This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
24 observations are missing between 2021 Oct and 2023 Sep